Add Windows Ninja Multi-Config build with sccache caching#248
Merged
Conversation
…dows
The two Windows native build jobs are the only uncached native builds: the
Visual Studio generator ignores CMAKE_{C,CXX}_COMPILER_LAUNCHER, so sccache
cannot be wired into them. Upstream llama.cpp ships its windows-cuda artifact
with Ninja Multi-Config + MSVC, proving the combination works on the same tree.
- build.bat: add an sccache probe guard mirroring build.sh's
sccache_can_wrap_compiler() — when USE_CACHE=true and sccache is on PATH,
compile a trivial TU through `sccache cl.exe`; only on success pass
-DCMAKE_{C,CXX}_COMPILER_LAUNCHER=sccache and print `sccache --show-stats`.
A missing/crashing sccache falls back to a green uncached build. Inert for
the VS generator jobs (they do not set USE_CACHE).
- publish.yml: add two evaluation-only jobs, build-windows-x86_64-ninja and
build-windows-x86-ninja (windows-2025-vs2026, ilammy/msvc-dev-cmd@v1,
sccache v0.16.0, Depot WebDAV env, `build.bat -G "Ninja Multi-Config"`).
Artifacts are named Windows-*-ninja (NOT *-libraries) so the package job's
`pattern: "*-libraries"` does not consume them; package's `needs:` is
unchanged. They run in parallel with the trusted VS jobs.
- TODO.md: flip the Windows cache section from "implementation notes" to
"evaluation in progress"; document what remains (confirm cache hits, then
rename artifacts to *-libraries, wire into package `needs`, retire VS jobs).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…ermanent MSVC
Builds on the prior Ninja evaluation jobs. Per owner decision, the MSVC /
Visual Studio Windows build is the default JAR and is kept permanently (the
sccache cache loss on it is accepted); the Ninja Multi-Config build is shipped
ALONGSIDE it as a separate classifier JAR, never as a replacement. The two
generators produce different jllama.dll files, so they cannot share a resource
path in one JAR — hence the classifier (mirrors the cuda / opencl-android
pattern). Result: 4 permanent Windows build jobs, both generators distributed
and tested end-to-end.
- pom.xml: add `windows-ninja` profile producing a <classifier>ninja-windows</classifier>
JAR from ${outputDirectory}_windows_ninja (separate compile pass + resource
copy + classified jar; mirrors cuda / opencl-android).
- publish.yml: the package, publish-snapshot, and publish-release jobs download
Windows-{x86_64,x86}-ninja into src/main/resources_windows_ninja/ and activate
the `windows-ninja` profile (-P ...,windows-ninja). Add a
test-java-windows-x86_64-ninja job that loads the Ninja DLL via JNI and runs
the full model-backed suite (parity with test-java-windows-x86_64). Wire the
Ninja build + Java-test jobs into the package `needs:` graph.
- .gitignore: ignore src/main/resources_windows_ninja/ (CI-staged, never committed).
- README.md: add the `ninja-windows` classifier row + dependency snippet.
- CLAUDE.md: add "Windows Ninja artifact" section; refresh the sccache "Windows"
note (no CMakeLists change — routing is CI-download + pom-profile, not a GGML flag).
- TODO.md: rewrite the Windows section to the final dual-build design (MSVC kept
forever; remaining work is cache-hit verification, not a redesign).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…fix) The b9739 upgrade (#247) added tools/server/server-schema.cpp to the jllama library target but not to the jllama_test executable target. server-schema.cpp defines server_schema::eval_llama_cmpl_schema(), which test_server.cpp and the upstream server-context.cpp reference, so the test link failed with an undefined/unresolved external symbol on every platform that builds the tests (Linux x86_64, macOS, Windows MSVC, Windows Ninja). The shared library itself links fine — only jllama_test was missing the TU. Add server-schema.cpp to the jllama_test source list, mirroring its presence in the library target. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…behavior
b9739's server-schema validation changed two behaviors that the existing tests
encoded from an earlier llama.cpp:
- negative n_discard is now range-checked (0 <= value <= INT32_MAX) and throws,
instead of being silently clamped to 0; and
- every field-validation failure is wrapped in std::invalid_argument
("Field '<name>': ...", server-schema.cpp) rather than surfacing the inner
std::runtime_error.
Update the four affected assertions (NDiscard negative now expects a throw;
dry_sequence_breakers / lora / repeat_last_n expect std::invalid_argument) and
the descriptive comment block. With the prior server-schema.cpp test-link fix,
the full C++ suite passes locally (446/446).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…neW) regression Root-cause write-up only (fix deferred per discussion). On Windows x86_64, b9739's common_params_parse has a _WIN32 prologue that replaces the caller-supplied argv with GetCommandLineW() (the host process command line). Correct for the standalone llama-server.exe, but under JNI the process is java.exe (no --model on its command line), so the parse fails with "Failed to parse model parameters" for every model-loading Java test. Affects both the MSVC and Ninja DLLs; Linux/macOS unaffected. Documents symptom, exact upstream code (arg.cpp:924-931), why our argv is platform-neutral, and three fix options to choose from later. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…cross LTS) The dockcross linux-arm64-lts image ships GCC 8.5 / glibc 2.17, which can no longer compile llama.cpp b9739 (its C++17 CTAD-in-`new` needs GCC >= 12). Mirror upstream llama.cpp's own aarch64 release job: build natively on GitHub's free ubuntu-24.04-arm runner with GCC 14. - publish.yml: rewrite crosscompile-linux-aarch64 as a native build (setup-java -> mvn compile -> build.sh) with -DGGML_NATIVE=OFF for ARMv8 portability, and run ctest on real ARM hardware for the first time. Job id kept (downstream needs:); display name -> "Build and Test Linux aarch64". - build.sh: generalize the Linux sccache auto-fetch from x86_64-only to map uname -m -> x86_64/aarch64 static-musl release (x86_64 path unchanged). - CLAUDE.md: document the native build and the glibc floor change (2.17 -> ~2.39, matching upstream's ARM binaries). Trade-off: the aarch64 artifact's glibc floor rises to ~2.39 (same envelope upstream's own ARM binaries require); x86_64 stays at glibc 2.17. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
The default CPU JAR's Linux aarch64 native is now built natively on ubuntu-24.04-arm (mirroring upstream), so it requires glibc >= 2.39. Document this in the "Choosing the right classifier" runtime-requirement column so users on older-glibc ARM hosts know to expect a load failure. Linux x86-64 stays at glibc 2.17 (manylinux2014). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…#24779) Introduce a generic source-patch mechanism for the FetchContent'd llama.cpp tree so fixes apply to every C++ build (all CI jobs + local) from one place: - patches/ : drop *.patch / *.diff here (applied in filename order) - cmake/apply-llama-patches.cmake : cross-platform (cmake -P), idempotent (git apply --reverse --check skips already-applied), fail-loud (a stale patch aborts configure so it can't be silently dropped from a release build) - CMakeLists.txt : wired as the llama.cpp FetchContent PATCH_COMMAND First patch, 0001-win32-arg-parse-embed-guard.patch, fixes the b9739 Windows JNI regression from upstream #24779: common_params_parse unconditionally replaced the caller's argv with the process command line (GetCommandLineW), so an embedded java.exe lost its --model args -> "Failed to parse model parameters". The guard adopts the process command line only when the re-derived arg count equals argc: true for the standalone llama-* tools (UTF-8 CLI fix preserved), false for a JVM host (our already-UTF-8 argv kept). Verified the patch applies cleanly to b9739 and the applier is idempotent. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…make The REUSE license-compliance check (fsfe/reuse-action) failed on the new files: - cmake/apply-llama-patches.cmake had the license but no copyright -> add inline SPDX-FileCopyrightText (mirrors build.sh). - patches/*.patch is unified-diff format and cannot carry an inline header without corrupting the diff -> annotate via REUSE.toml with a patches/** glob, so every current and future patch in the directory is covered automatically. Verified locally: `reuse lint` -> compliant with REUSE 3.3. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
The count-guard variant of 0001-win32-arg-parse-embed-guard.patch fixed 21/25 Windows Java tests but collided on the 4 server-integration setups (Rerank, ToolCalling, Multimodal, OpenAiCompatServer) whose argv length happened to equal java.exe's process arg count — the guard then wrongly adopted java.exe's command line and they kept failing with "Failed to parse model parameters". Those tests pass on Linux/macOS, so their args are valid; the collision was the only cause. Switch to the deterministic fix: keep the make_utf8_argv() call referenced (no -Wunused-function) but never adopt its result, so the caller's already-UTF-8 JNI argv is always used. A JNI library is never the process, so the GetCommandLineW override is pure liability for us. Verified the patch applies cleanly to b9739 and the applier stays idempotent. CLAUDE.md + TODO.md updated to record the count-guard -> removal change; upstream PR can still expose an opt-out that preserves the standalone tools' UTF-8 fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
… Rating) SonarCloud's GitHub Actions analyzer flagged these (they drive the "Security Rating on New Code = C" gate — the "C" is the rating grade, not the C language; the scan is the Maven scanner over YAML/Java, not CFamily): - clang-format.yml: pip install without --only-binary :all: can execute a source package's setup.py at install time. Force wheels-only (clang-format ships manylinux wheels). This is the only finding in the new-code window, so it is what fails the gate. Uses a block scalar so ":all:" doesn't break YAML. - osv-scanner.yml / scorecard.yml: tighten top-level "permissions: read-all" to "contents: read". Safe — every job in both files already declares its own exact permissions (security-events: write, id-token: write, etc.), which fully override the workflow default. Not touched: publish.yml's workflow-level "contents: read" (Sonar wants it at job level) — Low severity, not in the new-code window (doesn't affect the gate), already owner-assigned, and spreading it across ~25 jobs is invasive on the release pipeline. Left for a separate decision. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
SonarCloud flagged three Critical Reliability bugs in PairTest: assertNotNull on pair.hashCode(), which returns a primitive int that can never be null, so the assertions verified nothing. Replace them with a deterministic-hashCode check (capture the value, assert a repeated call returns the same int) — this exercises the real "doesn't throw + consistent across calls" intent. A local variable is used so the two assertEquals arguments are distinct expressions (avoids the identical-argument rule java:S5863). These are Reliability bugs, separate from the failing Security-Rating gate; fixing them improves the Reliability rating. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
- Add open items: SonarCloud "Security Rating on New Code" gate status (GHA findings fixed in 84297e0, publish.yml owner-accepted, PairTest reliability bugs fixed in 9f0d377, gate pending the last dashboard finding); the upstream llama.cpp PR to drop the local Windows arg-parse patch; the aarch64 branch- protection rename. - Mark resolved: Windows-JNI arg-parse regression (patch option 2 chosen); Windows Ninja verification (green + sccache hits). - Add Done section for b9739 upgrade, Windows Ninja classifier, native ubuntu-24.04-arm aarch64 build, the generic patches/ mechanism, and CUDA sccache verification. - Fix stale b9682 -> b9739 references (pin is b9739). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
The dd264b2 scaffold (NativeServer.java + NativeServerSmokeTest) landed the native-HTTP-transport Java entry point, but TODO.md tracked no follow-up for wiring the upstream server.cpp route table to JNI. Add it under the server follow-ups so the remaining native-server work is visible. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
…248 item The combined commit status on #248 shows a "License Compliance" check failing with "17 issues found" — a dependency-license scanner app, separate from REUSE (green) and SonarCloud. Document it as open, note it is almost certainly pre-existing (the PR changes no dependencies), and how to triage it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.




Summary
ninja-windowsclassifier JAR, alongside the existing MSVC/Visual Studio build (which remains the default).build-windows-x86_64-ninja,build-windows-x86-ninja) that build with Ninja, cache via sccache + Depot WebDAV, and run C++ unit tests.test-java-windows-x86_64-ninjato validate the Ninja-built DLL end-to-end via JNI..github/build.bat(mirrorsbuild.sh) to safely enable the compiler launcher only when sccache is present and working.windows-ninjaMaven profile to produce the classifier JAR from a separate resource tree (src/main/resources_windows_ninja/).package,publish-snapshot, andpublish-releasejobs to download and activate the Ninja build.Rationale
The Visual Studio generator ignores
CMAKE_{C,CXX}_COMPILER_LAUNCHER, so the MSVC Windows jobs cannot use sccache caching. Rather than switch the trusted MSVC build, this change builds the same CPU natives a second time with Ninja Multi-Config (which does honor the launcher) and ships them as an optional classifier JAR. The MSVC build is kept as the permanent default; the Ninja artifact is an additional, cache-accelerated, independently validated option for users to compare/adopt. Upstream llama.cpp ships itswindows-cudaartifact with Ninja Multi-Config + MSVC, proving the combination works on the same tree.Test plan
ctest) and Java model-backed testsREADME.mdclassifier table,CLAUDE.md"Windows Ninja artifact" section,TODO.mdstatusRelated issues / PRs
Closes the Windows compiler cache task in
TODO.md(deferred → shipped as dual-artifact design).Checklist
CONTRIBUTING.mdandCODE_OF_CONDUCT.mdhttps://claude.ai/code/session_01SfvSZ76NW4e1qX1PjL4RKq